113 research outputs found
Bayesian survival modelling of university outcomes
Dropouts and delayed graduations are critical issues in higher education systems worldwide. A key task in this context is to identify risk factors associated with these events, providing potential targets for mitigating policies. For this, we employ a discrete time competing risks survival model, dealing simultaneously with university outcomes and their associated temporal component. We define survival times as the duration of the student's enrolment at university and possible outcomes as graduation or two types of dropout (voluntary and involuntary), exploring the information recorded at admission time (e.g. educational level of the parents) as potential predictors. Although similar strategies have been previously implemented, we extend the previous methods by handling covariate selection within a Bayesian variable selection framework, where model uncertainty is formally addressed through Bayesian model averaging. Our methodology is general; however, here we focus on undergraduate students enrolled in three selected degree programmes of the Pontificia Universidad Católica de Chile during the period 2000–2011. Our analysis reveals interesting insights, highlighting the main covariates that influence students’ risk of dropout and delayed graduation.
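The discrete time competing risks setup described in this abstract is usually fitted on "person-period" data, with one row per student per enrolled period. The following is a minimal sketch of that expansion and of the empirical cause-specific hazards, not the authors' implementation. The outcome labels, the use of semesters as the time unit, and the function names are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical labels: the abstract names graduation and two dropout types.
OUTCOMES = ("graduation", "voluntary_dropout", "involuntary_dropout")


def to_person_period(student_id: str, periods: int,
                     outcome: Optional[str]) -> List[Dict]:
    """Expand one student into one row per enrolled period.

    `outcome` is None for a student who is still enrolled (censored).
    Only the final period can carry an event; earlier rows are "none".
    """
    if outcome is not None and outcome not in OUTCOMES:
        raise ValueError(f"unknown outcome: {outcome}")
    rows = []
    for t in range(1, periods + 1):
        is_last = t == periods
        event = outcome if (is_last and outcome is not None) else "none"
        rows.append({"id": student_id, "period": t, "event": event})
    return rows


def cause_specific_hazards(rows: List[Dict]) -> Dict[Tuple[int, str], float]:
    """Empirical hazard of each event type in each period:
    (number of events of that type in period t) / (number at risk in t)."""
    at_risk = Counter(r["period"] for r in rows)
    events = Counter((r["period"], r["event"])
                     for r in rows if r["event"] != "none")
    return {key: n / at_risk[key[0]] for key, n in events.items()}


# Three made-up students: a graduate, a voluntary dropout, a censored case.
rows = (to_person_period("s1", 3, "graduation")
        + to_person_period("s2", 2, "voluntary_dropout")
        + to_person_period("s3", 3, None))
hazards = cause_specific_hazards(rows)
```

In a regression setting, the `event` column would become the response of a multinomial model with admission-time covariates attached to every row. The Bayesian variable selection step mentioned in the abstract is not shown here.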
BASiCS: Bayesian Analysis of Single-Cell Sequencing Data
Single-cell mRNA sequencing can uncover novel cell-to-cell heterogeneity in gene expression levels in seemingly homogeneous populations of cells. However, these experiments are prone to high levels of unexplained technical noise, creating new challenges for identifying genes that show genuine heterogeneous expression within the population of cells under study. BASiCS (Bayesian Analysis of Single-Cell Sequencing data) is an integrated Bayesian hierarchical model where: (i) cell-specific normalisation constants are estimated as part of the model parameters, (ii) technical variability is quantified based on spike-in genes that are artificially introduced to each analysed cell's lysate and (iii) the total variability of the expression counts is decomposed into technical and biological components. BASiCS also provides an intuitive detection criterion for highly (or lowly) variable genes within the population of cells under study. This is formalised by means of tail posterior probabilities associated with high (or low) biological cell-to-cell variance contributions, quantities that can be easily interpreted by users. We demonstrate our method using gene expression measurements from mouse Embryonic Stem Cells. Cross-validation and meaningful enrichment of gene ontology categories within genes classified as highly (or lowly) variable support the efficacy of our approach.
Incorporating unobserved heterogeneity in Weibull survival models: A Bayesian approach
Outlying observations and other forms of unobserved heterogeneity can distort inference for survival datasets. The family of Rate Mixtures of Weibull distributions includes subject-level frailty terms as a solution to this issue. With a parametric mixing distribution assigned to the frailties, this family generates flexible hazard functions. Covariates are introduced via an Accelerated Failure Time specification for which the interpretation of the regression coefficients does not depend on the choice of mixing distribution. A weakly informative prior is proposed by combining the structure of the Jeffreys prior with a proper prior on some model parameters. This improper prior is shown to lead to a proper posterior distribution under easily satisfied conditions. By eliciting the proper component of the prior through the coefficient of variation of the survival times, prior information is matched for different mixing distributions. Posterior inference on subject-level frailty terms is exploited as a tool for outlier detection. Finally, the proposed methodology is illustrated using two real datasets, one concerning bone marrow transplants and another on cerebral palsy
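To make the rate-mixture idea concrete, the sketch below simulates survival times whose Weibull rate is multiplied by a subject-level frailty. It assumes one particular parametrisation that the abstract does not state: conditional survival S(t | frailty) = exp(-alpha * frailty * t^gamma), with a gamma mixing distribution of mean 1 and variance 1/theta. Under that assumption the marginal survival has the closed form (1 + alpha * t^gamma / theta)^(-theta), which gives a simple check on the simulation. All function and parameter names are hypothetical.

```python
import random
from typing import List


def sample_rate_mixture_weibull(n: int, alpha: float, gamma: float,
                                theta: float, seed: int = 0) -> List[float]:
    """Draw n survival times from a gamma rate mixture of Weibulls.

    Assumed model (illustrative parametrisation, not taken from the paper):
      frailty ~ Gamma(shape=theta, scale=1/theta)   # mean 1, variance 1/theta
      S(t | frailty) = exp(-alpha * frailty * t**gamma)
    """
    rng = random.Random(seed)
    times = []
    for _ in range(n):
        frailty = rng.gammavariate(theta, 1.0 / theta)
        # Inverse transform: alpha * frailty * T**gamma is Exponential(1).
        e = rng.expovariate(1.0)
        times.append((e / (alpha * frailty)) ** (1.0 / gamma))
    return times


def marginal_survival(t: float, alpha: float, gamma: float,
                      theta: float) -> float:
    """Closed-form marginal survival after integrating out the gamma frailty."""
    return (1.0 + alpha * t ** gamma / theta) ** (-theta)


times = sample_rate_mixture_weibull(20000, alpha=1.0, gamma=1.5, theta=2.0, seed=1)
empirical = sum(t > 1.0 for t in times) / len(times)
theoretical = marginal_survival(1.0, alpha=1.0, gamma=1.5, theta=2.0)
```

Subjects with unusually small frailty draws produce unusually long survival times, which is the intuition behind using posterior frailty estimates to flag outliers.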
Deep generative modeling for single-cell transcriptomics.
Single-cell transcriptome measurements can reveal unexplored biological diversity, but they suffer from technical noise and bias that must be modeled to account for the resulting uncertainty in downstream analyses. Here we introduce single-cell variational inference (scVI), a ready-to-use scalable framework for the probabilistic representation and analysis of gene expression in single cells (https://github.com/YosefLab/scVI). scVI uses stochastic optimization and deep neural networks to aggregate information across similar cells and genes and to approximate the distributions that underlie observed expression values, while accounting for batch effects and limited sensitivity. We used scVI for a range of fundamental analysis tasks including batch correction, visualization, clustering, and differential expression, and achieved high accuracy for each task.
Single-nucleus RNA-seq2 reveals functional crosstalk between liver zonation and ploidy.
Single-cell RNA-seq reveals the role of pathogenic cell populations in development and progression of chronic diseases. In order to expand our knowledge on cellular heterogeneity, we have developed a single-nucleus RNA-seq2 method tailored for the comprehensive analysis of the nuclear transcriptome from frozen tissues, allowing the dissection of all cell types present in the liver, regardless of cell size or cellular fragility. We use this approach to characterize the transcriptional profile of individual hepatocytes with different levels of ploidy, and have discovered that ploidy states are associated with different metabolic potential, and gene expression in tetraploid mononucleated hepatocytes is conditioned by their position within the hepatic lobule. Our work reveals a remarkable crosstalk between gene dosage and spatial distribution of hepatocytes. Funder: Cancer Research UK.
Normalizing single-cell RNA sequencing data: challenges and opportunities
Single-cell transcriptomics is becoming an important component of the molecular biologist's toolkit. A critical step when analyzing data generated using this technology is normalization. However, normalization is typically performed using methods developed for bulk RNA sequencing or even microarray data, and the suitability of these methods for single-cell transcriptomics has not been assessed. We here discuss commonly used normalization approaches and illustrate how these can produce misleading results. Finally, we present alternative approaches and provide recommendations for single-cell RNA sequencing users
High-Sensitivity Cardiac Troponin and the Universal Definition of Myocardial Infarction.
Background: The introduction of more sensitive cardiac troponin assays has led to increased recognition of myocardial injury in acute illnesses other than acute coronary syndrome. The Universal Definition of Myocardial Infarction recommends high-sensitivity cardiac troponin (hs-cTn) testing and classification of patients with myocardial injury based on aetiology, but the clinical implications of implementing this guideline are not well understood. Methods: In a stepped-wedge cluster randomized controlled trial, we implemented a hs-cTn assay and the recommendations of the Universal Definition in 48,282 consecutive patients with suspected acute coronary syndrome. In a pre-specified secondary analysis, we compared the primary outcome of myocardial infarction or cardiovascular death and secondary outcome of non-cardiovascular death at one year across diagnostic categories. Results: Implementation increased the diagnosis of type 1 myocardial infarction by 11% (510/4,471), type 2 myocardial infarction by 22% (205/916), and acute and chronic myocardial injury by 36% (443/1,233) and 43% (389/898), respectively. Compared to those without myocardial injury, the rate of the primary outcome was highest in those with type 1 myocardial infarction (cause-specific hazard ratio [csHR] 5.64, 95% confidence interval [CI] 5.12 to 6.22), but was similar across diagnostic categories, whereas non-cardiovascular deaths were highest in those with acute myocardial injury (csHR 2.65, 95%CI 2.33 to 3.01). Despite modest increases in anti-platelet therapy and coronary revascularization after implementation in patients with type 1 myocardial infarction, the primary outcome was unchanged (csHR 1.00, 95%CI 0.82 to 1.21). Increased recognition of type 2 myocardial infarction and myocardial injury did not lead to changes in investigation, treatment or outcomes. 
Conclusions: Implementation of high-sensitivity cardiac troponin and the recommendations of the Universal Definition of Myocardial Infarction identified patients at high risk of cardiovascular and non-cardiovascular events, but was not associated with consistent increases in treatment or improved outcomes. Trials of secondary prevention are urgently required to determine whether this risk is modifiable in patients without type 1 myocardial infarction. Clinical Trial Registration: URL: https://clinicaltrials.gov Unique Identifier: NCT0185212
Aging increases cell-to-cell transcriptional variability upon immune stimulation
Aging is characterized by progressive loss of physiological and cellular functions, but the molecular basis of this decline remains unclear. We explored how aging affects transcriptional dynamics using single-cell RNA sequencing of unstimulated and stimulated naïve and effector memory CD4(+) T cells from young and old mice from two divergent species. In young animals, immunological activation drives a conserved transcriptomic switch, resulting in tightly controlled gene expression characterized by a strong up-regulation of a core activation program, coupled with a decrease in cell-to-cell variability. Aging perturbed the activation of this core program and increased expression heterogeneity across populations of cells in both species. These discoveries suggest that increased cell-to-cell transcriptional variability will be a hallmark feature of aging across most, if not all, mammalian tissues. Funded by the European Research Council (F.C., T.F.R., D.T.O., S.A.T., and M.J.T.S.), EMBO Young Investigators Programme (D.T.O.), Cancer Research UK (H.-C.C., M.d.l.R., D.T.O., and J.C.M.), Janet Thornton Fellowship (WT098051 to C.P.M.-J.), Sir Henry Dale Fellowship jointly funded by the Wellcome Trust and the Royal Society (107609/Z/15/Z to M.d.l.R.), European Molecular Biology Laboratory (N.E., A.A.K., M.J.T.S., S.A.T., and J.C.M.), Medical Research Council Biostatistics Unit (MRC_MC_UP_0801/1 to C.A.V.), WTSI (C.P.M.-J., S.A.T., J.C.M., and D.T.O.), and Biotechnology and Biological Sciences Research Council–Collaborative Awards in Science and Engineering Studentship with Abcam plc (A.A.K.).
Beyond comparisons of means: understanding changes in gene expression at the single-cell level
Traditional differential expression tools are limited to detecting changes in overall expression, and fail to uncover the rich information provided by single-cell level data sets. We present a Bayesian hierarchical model that builds upon BASiCS to study changes that lie beyond comparisons of means, incorporating built-in normalization and quantifying technical artifacts by borrowing information from spike-in genes. Using a probabilistic approach, we highlight genes undergoing changes in cell-to-cell heterogeneity but whose overall expression remains unchanged. Control experiments validate our method’s performance and a case study suggests that novel biological insights can be revealed. Our method is implemented in R and available at https://github.com/catavallejos/BASiCS
Extension of the core map of common bean with EST-SSR, RGA, AFLP, and putative functional markers
Microsatellites and gene-derived markers are still underrepresented in the core molecular linkage map of common bean compared to other types of markers. In order to increase the density of the core map, a set of new markers was developed and mapped onto the RIL population derived from the ‘BAT93’ × ‘Jalo EEP558’ cross. The EST-SSR markers were first characterized using a set of 24 bean inbred lines. On average, the polymorphism information content was 0.40 and the mean number of alleles per locus was 2.7. In addition, AFLP and RGA markers based on the NBS-profiling method were developed and a subset of the mapped RGA was sequenced. With the integration of 282 new markers into the common bean core map, we were able to place markers with putative known function in some existing gaps including regions with QTL for resistance to anthracnose and rust. The distribution of the markers over 11 linkage groups is discussed and a newer version of the common bean core linkage map is proposed.
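The abstract reports an average polymorphism information content (PIC) of 0.40 without stating the formula. A common definition, assumed here and not confirmed by the abstract, is PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of allele i at a locus. The sketch below computes it for one locus; the function name and the example frequencies are made up for illustration.

```python
from itertools import combinations
from typing import Sequence


def polymorphism_information_content(freqs: Sequence[float]) -> float:
    """PIC for one locus under the assumed definition
    1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2."""
    if any(p < 0 for p in freqs) or abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must be non-negative and sum to 1")
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term


# Two equally frequent alleles: 1 - 0.5 - 2 * 0.25 * 0.25 = 0.375
pic_two_alleles = polymorphism_information_content([0.5, 0.5])
```

Some studies instead report 1 - Σ p_i² (expected heterozygosity) under the name PIC, so published values are comparable only when the definition is known.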